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[7) generalized Good's algorithm to allow arbitrary fac- 
torization of the dimension of a single-variate transform. 
Their report of dramatic time savings has stimulated wide- 
spread interest in the fast Fourier transform. 

The Cooiey-Tukey fast Fourier transform algorithm 
can be used with any value of n, and the required number 
of arithmetic operations is proportional to n times the sum 
of the factors of n. The. case of n = 2 m is of special interest 
because of the ease of programming and is sufficiently 
general for most applications in which the user is free to 
choose n. Computing time for n a power of two increases 
as 

7i logs u = nm 

and roundoff errors increase slowly. Even for small n, this 
algorithm is faster and more accurate than previous 
methods, and it permits solution of large problems that 
would before have required an impractical amount of 
computing time. 

In this paper, we consider several topics related to the 
fast Fourier transform. Reviewed first is the fast trans- 
form algorithm for n a power of two, giving shortcuts for 
computing the needed trigonometric function values, and 
showing a convenient method of transforming real- 
valued data. We then consider computing the fast trans- 
form on a system with virtual memory and rearrange the 
transform steps to reduce the number of memory overlay 
operations. The approach given is applicable to the mixed 
radix transform as well as the radix two transform. A 
scaling plan for computing the fast transform with fixed- 
point arithmetic is also given. 



i represents the imaginary unit V — 1. In matrix nota- 
tion, we can represent this transformation by 

X = Ta 

where T is the n X n matrix of complex- valued exponen- 
tials 

fa - exp (i2irjk/n) 

— cos (2irjk/n) + % sin (2vjk/n) 

for;, k = 0, 1, • • -,7i - 1. 

The inverse transform 

^ n-l 

<Xk =* - £ xj exp ( —i2Tcjk/n) 
n /«o 

for k =b 0, 1, • • n — 1 

can similarly be represented by 

a = i T\ 
n 

where T* is the complex conjugate of T; i.e., 
t% = exp (—i2icjk/n) 

= cos (2-xjk/n) — i sin (2wjk/n) 

for;, k - 0, 1, • *->n — 1. 
The matrices T and have the relation 

TT* « T*T - nl 
where / is the n X n identity matrix. 1 
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c* ss exp (i2irfr/8) 
Fio. 1. Example of fast Fourier transform algorithm for n - 8 



The Fast Fourier Transform 

The fast Fourier transform algorithm is a method of 
computing the finite complex Fourier transform 

n-l 

Xj = «* (i2rjk/n) for j = 0, 1, ■ * • , n — 1, 

A— 0 

where a and x are complex-valued vectors of length n, and 



If n has m factors n x , n s , • • - n m , where n = 11/ n; > 
the Cooley-Tukey fast transform algorithm decomposes 
the transformation T into the product of m elementary 



1 When using a program not having the complex conjugate 
transform as an option, one can compute T*x by forming the 
complex conjugate of x t transforming by T t then taking the com^ 
plex conjugate of the result, i.e., (TV)*. 
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transformations, followed by a permutation P of the 
result. 2 In matrix notation, T can be represented by 

T - PS 

Each step 8, is in turn composed of n/nj transforms of 
dimension n, . Since the number of arithmetic operations 
for a transform of dimension n } - is of the order of n, 2 , the 
total number of operations for the transformation S is of 
the order of n X)i • The permutation P adds compara- 
tively few additional operations. 

Consider now the case n = 2 m . Each step Sj consists of 
computing n/2 transforms of dimension 2. In the trans- 
formation Si , pairs of elements n/2 apart in the complex 
data vector are transformed. In S 2 , pairs of elements n/2 
apart in the upper and lower halves of the data vector are 
selected. In Si , pairs of elements n/4 apart within each 
quarter of the data vector are selected, and so on until 
pairs of adjacent elements are selected in the transforma- 
tion S m . These steps are illustrated in Figure 1 for % = 8. 
We defer for the moment discussion of the sequence of 
exponential multipliers used and consider first the per- 
mutation P of the transform result. 

To bring the results into normal binary order, we must 
interchange pairs of entries according to the following rule: 
We represent an index j as 

3 - im-12^ 1 + j m -i2 m ~* + • • ■ + ji2 l + jo ' 

and associate with it the reverse binary representation 

ry = jo2 m - 1 + ji2' n - 2 H + im-*2» + jm-l J 

if r, > j, we interchange entries r$ and j. Note that if 
Ti = y, as is true, for example, for; = 0, n — 1, n/2 - 2, 
n/2 + I, • • , thejth entry remains fixed. The permuta- 
tion matrix P of this interchange is symmetric and PP = /, 
i.e., permuting twice gives an identity permutation. Thus, 
a computing procedure for permuting from reverse binary 
to normal order can also be used for permuting data from 
normal to reverse binary order. 

In computing the fast Fourier transform, we can al- 
ternatively permute the data first, then compute the 
transform. Thus 

a 8 Tx = PSx 

= (PSP)Px 

= (PS m P)(PS^P) t • • • , (PSiP)Pz. 

Whereas in Sj , we selected pairs of elements 2" 1 -'" apart, in 
(PSjP) } we select pairs 2'- 1 apart. Thus in (PSiP), we 
select adjacent pairs of data elements and in PS m P we 
select pairs of elements n/2 apart. In some applications, 



1 We assume throughout this paper that the transform is com- 
puted in place with the results of each transform of dimension n/ 
replacing the previous values. 
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we will wish to have computing procedures for both trans- 
formations S and (PSP) to avoid, the need for permuta- 
tion. We can then transform a data vector with S, modify 
the result, and transform back with (PS*P). 

In computing the radix 2 transform in place, the schedule 
of pans of data elements used depends on the original 
ordering of the data. If the data are in normal order, we 
start with pairs n/2 apart in the first step and end with 
adjacent pairs in the last step; the result is left in reverse 
binary order, If the data are in reverse binary order, wc 
start with adjacent pairs in the first step and end with 
pairs n/2 apart in the last step; the result is then in normal 
order. 

We now consider the sequence of complex exponential 
multipliers used in the radix 2 transform, 9 or, equivalently, 
the required sequence of pairs of sine and cosine values. 
The example in Figure 1 is based on the form of the fast 
transform algorithm proposed by Cooley and Tukey [7], 
If we consider the n = 16 case, the required multiples of 
2tt/16, if we take the eight pairs of data elements in order 
according to the lower index member of each pair, are: 

0, 0, 0, 0, 0, 0, 0, 0 1st step 

0, 0, 0, 0, 4, 4, 4, 4 2nd step 

0, 0, 4, 4, 2, 2, 6, 6 3rd step 

0, 4, 2, 6, 1, 5, 3, 7 4th step. 

We note that if the data elements are indexed in normal 
order, the trigonometric function values are used in re- 
verse binary order. For the corresponding transform 
(PSP) with the data originally in reverse binary order, the 
required multiples of 2jt/1(> are: 

0, 0, 0, 0, 0, 0, 0, 0 1st step 

0, 4, 0, 4, 0, 4, 0, 4 2nd step 

0, 2, 4, 6, 0, 2, 4, 6 3rd step 

0, 1, 2, 3, 4, 5, ft, 7 4th step.. 

In this case, trigonometric function values can be gener- 
ated in normal sequence as needed. However, for efficient 
use of these values, we need to step through the data 
array more than once in all but the first and last steps. 
This can be a disadvantage if we are using a computing 
system allowing automatic overlay of data storage. 

We can invert the transformation S by using the trans- 
formation l/n(PS*P)> since 

r «. I T*T m I (PS*) (PS) - i (PS*P)S. 
n n n 



» This topic is discussed by Gentleman and Sande (81 for the 
mixed radix case. 
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Alternatively, we can express tlie inverse as: 



TABLE I. Extrapolated Values of cos t/2 ani> 1 -sin tt/2 
for Two Difference Equation Methods 



the product of the inverses of the m steps Sj in the trans- 
form S. Considering an elementary dimension 2 transform 

y s - x } + Xk exp (iO) 

y k = Xj - x k exp (id) 

in *S, we .see that its inverse is 

*/ i(Vi + Vk) 

*h = Ky/ ~ y*)exp (-*). 

Using this modified arithmetic, wc can backtrack through 
the steps of transform *S. Considering again the first n = 16 
example above, we now use the trigonometric function 
sequence of the 4th step, the 3rd step, the 2nd step, then 
finally the 1st step. 

Similarly if our data are in normal order, we can back- 
track through the steps of the transform (PSP). Con- 
sidering the second n = 16 example above, we now use the 
trigonometric function sequence of the 4th step, the 3rd 
step, the 2nd step, and finally the 1st step. Sande [8] has 
proposed using this organization of the fast transform and 
points to the advantage of using the trigonometric func- 
tion values in normal order. 

The author has coded Algol radix 2 transforms based 
on the (PSP)" 1 organization for data in normal order and 
based on the (PSP) organization for data in reverse binary 
order [9] for use on a system with fixed memory allocation. 
The S and S~ x organizations are used in a later section of 
this paper as the basis of a method to be used on a com- 
puter system with memory overlay; the advantage of 
being able to use each pair of trigonometric values on a 
single sequence of consecutive data points outweighs the 
disadvantage of generating them in reverse binary order. 

Computing Trigonometric Function Values 

Counting one sine and cosine value for each 2X2 
transform, the fast Fourier transform of n = 2 m data points 
requires nearly 2n trigonometric function values. In most 
methods of computing the fast transform, we can reduce 
this number to n by using the relations: 



cos 



sin 



(i 



^ = -sin (6) 
^ = cos (0). 



We may further reduce the number to n/2 by using similar 
relations for the angles r/2 — 6 and ir — 0. If we complete 
each of the m transform steps before going on to the next, 
further reduction in the number of trigonometric values 
requires storing some values for later use. 

If sufficient high speed storage is available, we can table 
the values 

sin (2-irj/n) for j - 0, 1, • • - ,n/4; 
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this table contains all of the values needed in the transform 
computation. The entries can be quickly filled in by a 
binary process. We first enter the values for 0, w/4, and 
ir/2. On the next step, the values for tt/S and 3tt/8 are 
computed, using the relation 



sin ((Jb + 1)0) = 



ain (Jed) + sin ((fc + 2)0) 
2 cos 6 



where 6 = w/8. At each succeeding step the number of 
values is doubled until the table is complete. If the values 
are used in reverse binary sequence in computing the fast 
transform the table may then be permuted for con- 
venience in indexing. 

Another approach is to generate the trigonometric 
function values for successive multiples of 2ir/n by differ- 
ence equation. Since we extrapolate from initial values, 
care in selecting a method is needed. We show two methods 
that give satisfactory accuracy. In the first, we generate 
cosine and sine values as independent sequences, using the 
second difference relations 

CWi = R X cos (he) + C k 

cos ((A + 1)6) « cos (hS) + CV 

S k +i = BX sin (fed) + St 

sin ((k + 1)0) = sin (kd) + S* f t , 

where the constant multiplier is 

R = -4 sin' (6/2) 

and the initial values are 

Co = 2 sin 2 (6/2) 

So = sin (d) 
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cos (0) - 1 
sin (0) = 0. 

Another method, in which the sine and cosine* are com- 
puted as a pair with four multiplications instead of two, 
is as follows: 

cos ((k + 1)0) 

= [CX cos (kg) - S X sin (k$)] + cos (kg) 
sin ((k + 1)6) 

= [C X sin (Afl) + SXcos (to)] + sin (Atf), 

where 

C - -2 sin 2 (0/2) 
5 » sin (0). 

The two methods were tested using initial values calcu- 
lated by the library trigonometric function procedure on a 
Burroughs B5500 computer, then values for cos n/2 and 
sin tt/2 were computed by extrapolation. Results are given 
in Table I. The computer used has a 39-bit floating-point 
mantissa, with octal exponent; thus for a number between 
1/8 and 1, a change in the low order bit gives a difference 
of 1.8 X I0~ u . Both methods result in alow level of errors, 
but when used in a Fourier transform program, the second 
method often yields about one binary place more accuracy 
than the first. 

Transforming Real Data 

If the data vector to be transformed is real-valued, one 
can augment the data with an imaginary component 
vector of zeros and then use a subroutine for computing 
the complex Fourier transform. The resulting Fourier 
coefficients a* for k = 0, 1, • • • n — 1 have a complex con- 
jugate symmetry about n/2, thus 

a n -k = a** for k = 1, 2, • • • n — 1, 

As pointed out by Cooley [10], this symmetry allows us to 
transform simultaneously two real vectors x and y and 
separate the results. We compute the transform 

X = I T(x + iy) . 

- i Tx + i i Ty 
n n 

= \ (a + iff) 

where « and /3 are the real Fourier transforms of x and y, 
and then recover a and 0 by 

JXo + Xo* for k = 0 

"* " W + Xj-jt for A; - 1,2, •••,n/2, 

fi(Xo* - Xo) for A; - 0 



Bingham, Godfrey, and Tukey (11] point out that this 
technique can be \ised to transform a single sequence of 
real data points if n is divisible by two. Alternate data 
points are stored in the real and imaginary components of 
the complex vector to be transformed. After transforming, 
permuting to normal order, and separating the Fourier 
coefficients of the two sequences, the two coefficient sets 
are combined in a final step. In using this method, two 
reorderings are needed if the real data are initially in a 
single sequence, one before the transform and another 
after. For n a power of two, the final permutation from 
reverse binary io normal order can be done in place in one 
step by pan* interchanges. However, the initial permuta- 
tion is not as easily done in place.* Here we propose a 
simple solution to this problem. 

Let us suppose that the original data are stored in 
normal sequence, with the first n/2 entries in the real 
vector A and the second n/2 entries in the imaginary 
vector B of the transform. We then in two steps permute 
the data so that the even numbered entries are in A and the 
odd-numbered entries in B t both in reverse binary order. 
If wc then compute the transform with a procedure operat- 
ing on complex data originally in reverse binary order, the 
transformed result is in normal order with no further per- 
mutation needed. In the first step, we interchange entries 

A M and Bj fori - 0, 2, . • (n/2) - 2, 

which leaves the even-numbered entries in A and the odd- 
numbered entries in B, In the second step, we permute A 
and B independently to reverse binary order, ignoring the 
low-order bit, as follows: for; = 0, 2, • • • (n/2) - 2, let k 
be the reverse binary value oij/2; then if k > (j/2) f inter- 
change Ay, and Aj and interchange A ik +i and A;+i . The 
same permutation is applied to the B vector. The two steps 
for N = 16 are shown in Figure 2, where the numbers 
listed are the original indices of the data. It should be noted 
that the order of doing the two steps of the permutation 
may be interchanged if desired. 
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Fig. 2. Permutation of real data before Fourier transform 

The inverse operation of evaluating a real Fourier series, 
given a set of cosine and sine coefficients, can be done by 

4 We assume that the real and imaginary components of complex 
data are stored in separate arrays. If interleaved, as with some 
FORTRAN compilers, the problem considered here does not exist. 
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backtracking through the steps described above, comput- 
ing the inverse of each. While this is usually the moat con- 
venient method, another method offers advantages in 
computing convolutions. If we transform two real se- 
quences -of. length n to the frequency domain and form the 
product a/3* of the two transforms, the resulting n complex 
values are the Fourier coefficients of the convolution, a 
real sequence of length n. Thus, the real component a is an 
even function, the imaginary component ib is an odd 
function, and Ta and T(ib) are both real. Since 

T (a + *). - Re [T(a - b)} + Im [T(a - 6)], 

we can subtract b from a, transform, then add the real and 
imaginary components to get the convolution. The ad- 
vantage of this approach is that we can perform all of the 
frequency domain operations with the data in reverse 
binary order, reducing the intermediate result to a complex 
vector of length n/2 at this stage before transforming to 
the time domain. With the result in normal order, we then 
separate the transforms of the even- -and odd-numbered 
entries and combine them, using the aJgorithm for com- 
puting Fourier coefficients described above. A final addi- 
tion then gives the convolution. An Algol convolution 
procedure based on this approach is available 19]. 

Methods for Use on a Virtual Memory System 

In using the fast transform algorithm on the Burroughs 
B5500 computer, we have frequent need to transform data 
sets of a size exceeding the 2 1S words of core storage. One 
solution, using auxiliary memory for datastorage, has been 
explored previously [12]. Here we consider use of the com- 
puter's virtual core memory feature. 

Virtual memory and multiprogramming are standard 
features of the B5500 computer system. Data and pro- 
grams are stored in variable-length segments of up to 1023 
words. In Algol, array rows correspond to data segments, 
and for large radix two transforms, we store data in two- 
dimensional arrays with rows of 512 words in length. When 
a program refers to an element of an absent array row, the 
operating system takes control and finds core space for the 
row. Space occupied by program segments not currently 
in use is taken first, then data segments are copied to disk 
storage generally on a first-in, first-out basis, without re- 
gard to the program to which they belong. Then an input 
operation reads the desired row into core and control is 
returned to the user program. This organization places a 
premium on doing as much computing as possible with a 
small set of array rows before going on to others. Memory 
overlay operations begin when the total storage require- 
ment of the programs in the "mix" exceeds core capacity. 

The Sande version of the fast transform, while very 
efficient for transforming a single segment of data, is highly 
vulnerable to slowdown through memory overlay. A com- 
plex transform of size n = 2 13 can be computed without 
difficulty on the B5500 if it is run without multiprogram- 
ming, but the addition of another program of comparable 
size will slow the transform program almost to a halt. 



Fortunately, the Cooley version will run under the same 
conditions with only a moderate increase in computing and 
input-output channel time. In this section, we show a rear- 
rangement of the Cooley version giving a more efficient 
schedule of data accesses. With this arrangement, input- 
output channel time caused by memory overlay increases 
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Fig. 3. Computer and input-output channel times for the fast 
Fourier transform on a virtual memory system with 2" words of 
core storage 
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slowly as core capacity is exceeded. In Figure 3, we show a 
plot of computer and input-output channel tune for a 
radix two transform based on this arrangement. 6 

In the first step of the radix two fast transform algo- 
rithm, data pairs are taken in serial order from the full 
data set, one element from the lower half and the other 
from the upper half of the data vector. However, after the 
first step in the transform, the two halves of the complex 
data arrays are independent. Similarly, after the second 
step, the four quarters are independent, and so on. Thus, 
after the first step, we can complete the remaining m — 1 
steps in the first half of the data before going on to the 
second half. And after the second step in the first half, we 
can complete the remaining m — 2 steps on the first 
quarter of the data before going on to the second quarter. 
Continuing in this manner, we can compute the first pair 
of entries in the final result, having computed only the 
necessary intermediate results, then compute the second 
pair of entries in the final result, and so on. The sequence 
of steps for the n = 16 case is shown in Figure 4(a). For 
most of the computing, the needed subset of the data is 
small enough to fit within the high speed memory of the 
computer. 

In this method, we use the arithmetic of the Cooley 
vei*sion of the fast transform, thus the angle 6 of the expo- 
nential multiplier remains constant within each of the 
numbered sections in Figure 4. When a segment splits into 
two halves in moving to the adjacent transform step, the 
angle for the lower half is 0/2 and the angle for the upper 
half is (0 + tt)/2; a single sine-cosine pair serves for the 
two halves. We can use the library trigonometric function 
procedure to compute sin (0/2), then calculate cos (6/2) 
by the relation 

where sin (0) is remembered from the previous segment. In 
this way, the trigonometric function library procedure is 





m times. 




0 


0 


0 


8 


i 


1 


4 


4 


2 


12 


5 


3 


2 


2 


4 


10 


3 


5 


0 


0 


6 


14 


7 


7 


1 


8 


8 


9 


9 


9 


5 


12 


10 


13 


13 


11 


3 


10 


12 


11 


11 


13 


7 


14 


14 


15 


15 


15 


1- 


P.-J— 


— pH 



Fio. 5. Steps of reverse binary to binary permutation 

1 In an alternate version of this program, using radix eight arith- 
metic for the final stepa within a single real-imaginary pair of 
array rows, computing time is reduced by one-third, while input- 
output ohannel time remains unchanged, 



When we start with data in reverse binary order, a* 
similar method can be used. We do the first step of the 
transform on the first pair of data entries, then on the 
second pair. Before going on to the third pair, we first do 
the second step of the transform on the first four entries. 
After each initial pair is transformed by the first step, we 
proceed to all possible subsequent steps. The sequence of 
stepsforn = 16 is shown in Figure 4(b). The trigonometric 
functions are computed as above, except that we store a 
list of the next sine cosine pair for each of the m steps. 

The author has written Algol procedures for computing 
the fast Fourier transform on a virtual memory system, 
both for the radix two case [13] and for the mixed radix 
case [14]. In the mixed radix case, after doing a step of the 
transform with a factor n,- on a portion of the data array, 
when transforming data originally in normal order, that 
portion of the data can be subdivided into 7ty independent 
sections for further transform steps. 

For the n = 2 m case, the reverse hi nary- to-binary per- 
mutation can also be organized to reduce time loss through 
memory overlay. In this case, the permutation is done in 
several steps. In the first step, entries in the lower and 
upper halves of the complex vector are interchanged so 
that each entry is then in its correct half, In the second 
step, entries within each half are interchanged so that each 
entry is in its correct quarter. This process continues until 
the desired permutation has been completed. An example 
for n = 2 4 is given in Figure 5. In the first step, single 
entries in each vector are interchanged, and in the second 
step, pairs of entries are interchanged. For larger n, the 
process continues with entries interchanged 2 h ~ l at a time 
in the kth step. For n = 2 m , [m/2] steps are required, and 
at each step half the entries in each vector are interchanged 
and the other half remain fixed. On the B5500 computer, 
this permutation method can be speeded up by doing the 
block memory interchanges in character rather than word 
mode. Similar speed-ups can be achieved on other com- 
puters with block-memory transfer instructions. 

The permutation procedure has a partial generalization 
in the mixed radix case. If we factor n by arranging pairs of 
identical factors symmetrically about the factors of the 
square-free portion of n, we can then do one reordering 
step for each pair of identical factors,- using a simple se- 
quence of pair interchanges. After the first step, each result 
entry is in the correct section of length n/n x . After the 
fcth step, each result entry is in the correct section of length 
n/(n t X n 2 X • • • n k ). If the square-free portion of n con- 
tains at most one factor, the reordering is completed by 
this procedure. Otherwise a final step is required in which 
we compute the permutation cycles of the square-free 
portion of n and use this schedule to complete the re- 
ordering. 

Scaling for Fixed -Point Arithmetic 

The fast Fourier transform can be computed with good 
accuracy, using fixed-point arithmetic. On a computer 
without floating-point hardware, the saving in time can be 
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large. When a Fortran subroutine for the 8DS 930 com- 
puter was translated to a fixed-point machine language 
program, computing time was reduced by a factor of about 
ten. In the scaling used, the binary point is assumed to be 
just to the right, of the sign bit, and the initial complex 
data values and all intermediate results are scaled to be 
less than $ in magnitude. A stored table of sine values is 
used with sin (tt/2) set at the largest positive value less 
than one. 

Assuming that the complex data points Xj initially have 
magnitude less than one, we scale by an additional factor 
two. Thus 

< i for; - 0, 1, - -mi - h 

\z$ ± x k exp (id)\ < 1 

in the computation of each elementary 2X2 transform. 
If we then scale each intermediate insult by £ before 
storing, which can be done by shifting right by one place, 
the values used in computing the second step of the trans- 
form are again <\ in magnitude. Continuing in this way, 
we scale each result but the final one by j, giving an over- 
all scale factor of l/n for a transform of n — 2 1 " data 
values. The scaling is gradual, and satisfactory accuracy is 
preserved. 

This scaling is easily generalized to the mixed radix case. 
Assuming that initially 

< 1 for; = 0, 1, "-,71 - 1 

we scale the data before each step by l/n* , where is the 
factor of n used in the fcth step of the transform. The 
overall scaling is again l/n. 

Conclusion 

We have considered several techniques associated with 
computing the fast Fourier transform. First, we proposed 
ways of reducing time spent in computing trigonometric 
function values. One approach is to compute a tabic of 
n/4 + 1 sine values using an interpolation method. 
Another approach is to compute the values as needed in the 
transform calculation, using a difference equation extrapo- 
lation method. For the case of traiisfonrung the even- and 
odd-numbered entries of a set of n => 2 ro real data points, 
it is shown that there is an advantage in permuting the 
data to reverse binary order before the transform rather 
than computing the transform and then reordering after- 
ward. 

In presenting his paper [8] at the 1966 Fall Joint Com- 
puter Conference, Gentleman included some discussion of 
the impracticality of making use of virtual memory to 
compute the fast Fourier transform in problems exceeding 
core storage capacity. If we transform with the Sande ver- 
sion of the algorithm and permute in the usual way, his 
view is correct. However we have shown here that it is 
possible to rearrange both the transform and permutation 
to make practical computing large transforms on a virtual 
memory system. 
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Finally, we give a scaling for computing the fast trans- 
form with fixed -point arithmetic. The fast transform is well 
vsuited to fixed-point computing, and this approach should 
be considered if a significant reduction in computing time 
can be gained. 

Acknowledgments* The author thanks Dr, Samuel 
Scheclvter for helpful discussions of methods of computing 
trigonometric function values and Dr, Harold Stone for 
his comments and suggestions for revision of this paper. 

Received November, 1966; revised July, 1067 

REFERENCES 

1. Hamming, R. W. Numerical Methods for Scientists and 

Engineers. McGraw Hill Book Company, Inc., New York, 
1962. 

2. Goertzel, G. An algorithm for the evaluation of finite 

trigonometric series. Am, Math. Monthly 65 (Jan. 1958), 34- 
35. 

3. Cooley, J. W., Lewis, P. A. W., and Welch, P. D. Histori- 

cal notes on the fast Fourier transform. IEEE Trans. Audio 
Electroacoustics AU-16, 2 (June 1967), 76-79. 

4. Danielbon, G. C, and Lanczos, C. Some improvements in 

practical Fourier analysis and their application to X-ray 
scattering from liquids. J. Franklin Inst. 233 (1942), 365- 
330; 435-452. 

5. Rudnick, P. Note on the calculation of Fourier series. Math, 

Comput, 20, 95 (July 1966), 429-430. 

6. Good, I. J. The interaction algorithm and practical Fourier 

series. J. Roy. Statist. Soc. Ser. B, SO (1958), 361-372; Adden- 
dum, n (I960), 372-375. 

7. Cooley, J. W., and Tukey, J. W. An algorithm for the 

machine calculation of complex Fourier series. Math. Corn- 
put. t9 t 90 (April 1965), 297-301. 

8. Gentleman, W. M., and Sande, G. Fast Fourier trans- 

forms—for fun and profit. Proc. AFIPS 1966 Fall Joint 
Comput. Conf., Vol. 29, pp. 663-578, 

9. Singleton, R. C. An ALGOL convolution procedure based 

on the fast Fourier transform. SRI Project 181531-132, 
Stanford Res. Inst., Menlo Park, Calif., Jan. 1967, Defense 
Doc. Ctr. AD-646 628. 

10. Cooley, J. W. Harmonic analysis complex Fourier series. 

SHARE Program library No. SDA 3425, Feb. 7, 1966. 

11. Bingham, C., Godfrey, M. D., and Tukey, J. W. Modern 

techniques of power spectral estimation. IEEE Trans. 
Audio Electroacoustics AU-t6, 2 (June 1967), 56-66. 

12. Singleton, R. C. A method for computing the fast Fourier 

transform with auxiliary memory and limited high-speed 
storage. IEEE Trans. Audio Electroacoustics AU-15, 2 
(June 1967) 91-98. 

13. Singleton, R. C. ALGOL procedures for the fast Fourier 

transform. SRI Project 181531-132, Stanford Res. Inst,, 
Menlo Park, Calif., Nov. 1966, Defense Doc. Ctr. AD-643 
996. 

14. Singleton, R. C. An ALGOL procedure for the fast Fourier 

transform with arbitrary factors. SRI Project 181531-132, 
Stanford Res. Inst., Menlo Park, Calif., Dec. 1966, Defense 
Doc. Ctr. AD-643 997. 

Volume 10 / Number 10 / October, 1967 



